This file contains an example of tuning an XGBoost model with skopt's BayesSearchCV.

Load Data

Transformation Pipeline

Model

XGBoostError: XGBoost Library (libxgboost.dylib) could not be loaded on Apple Silicon (ARM)

https://github.com/dmlc/xgboost/issues/6909

pip install --upgrade --force-reinstall xgboost --no-binary xgboost -v

skopt.BayesSearchCV

https://scikit-optimize.github.io/stable/auto_examples/sklearn-gridsearchcv-replacement.html

https://towardsdatascience.com/xgboost-fine-tune-and-optimize-your-model-23d996fab663

max_depth: 3–10
n_estimators: 100 (lots of observations) to 1000 (few observations)
learning_rate: 0.01–0.3
colsample_bytree: 0.5–1
subsample: 0.6–1

Then, focus on optimizing max_depth and n_estimators. After that, experiment with the learning_rate: increase it to speed up training, as long as performance does not drop. If training becomes faster without a loss in performance, increase the number of estimators to try to improve performance.
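As a minimal sketch, the ranges above could be expressed as a BayesSearchCV search space like this. The parameter names follow the xgboost.XGBClassifier API; the commented estimator, n_iter, cv, and scoring settings are assumptions for illustration, not values taken from this notebook.

```python
# Candidate search space matching the ranges above. skopt's BayesSearchCV
# accepts plain (low, high) tuples and infers integer vs. real dimensions;
# a 'log-uniform' prior is a common choice for learning rates.
search_spaces = {
    "max_depth": (3, 10),                        # integer range
    "n_estimators": (100, 1000),                 # integer range
    "learning_rate": (0.01, 0.3, "log-uniform"), # real, log-scaled prior
    "colsample_bytree": (0.5, 1.0),              # real range
    "subsample": (0.6, 1.0),                     # real range
}

# Hypothetical usage (estimator and CV settings are assumptions):
# from skopt import BayesSearchCV
# from xgboost import XGBClassifier
# opt = BayesSearchCV(XGBClassifier(), search_spaces,
#                     n_iter=50, cv=5, scoring="roc_auc")
# opt.fit(X_train, y_train)
```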

Results

Timings

Best Scores/Params

BayesSearchCV Performance Over Time


Variable Performance Over Time


Scatter Matrix


Variable Performance - Numeric


Variable Performance - Non-Numeric


Individual Variable Performance


Regression on roc_auc Mean

Feature Importance

https://scikit-learn.org/stable/auto_examples/ensemble/plot_forest_importances.html
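Following the linked scikit-learn example, importances from a fitted model can be ranked from most to least important as sketched below. The feature names and importance values here are made-up placeholders standing in for a fitted model's feature_importances_, not results from this notebook.

```python
import numpy as np

# Placeholder values standing in for model.feature_importances_ after fitting.
feature_names = np.array(["duration", "credit_amount", "age", "foreign_worker"])
importances = np.array([0.40, 0.35, 0.20, 0.05])

# Sort features from most to least important, as in the sklearn example.
order = np.argsort(importances)[::-1]
ranked = list(zip(feature_names[order], importances[order]))
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```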

NOTE: foreign worker seems like it should be important but is ranked last in feature importance.